Knowledge Transfer with Jacobian Matching
Classical distillation methods transfer representations from a "teacher"
neural network to a "student" network by matching their output activations.
Recent methods also match the Jacobians, i.e., the gradients of the output
activations with respect to the input. However, this involves making some ad
hoc decisions, in particular the choice of the loss function.
In this paper, we first establish an equivalence between Jacobian matching
and distillation with input noise, from which we derive appropriate loss
functions for Jacobian matching. We then rely on this analysis to apply
Jacobian matching to transfer learning by establishing the equivalence of a
recent transfer learning procedure to distillation.
We then show experimentally on standard image datasets that Jacobian-based
penalties improve distillation, robustness to noisy inputs, and transfer
learning.
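To make the idea concrete, here is a minimal PyTorch sketch of distillation
with a Jacobian penalty. It is an illustrative sketch only: the summed-output
gradient used as a one-pass proxy for the full Jacobian and the squared-error
penalty are assumptions for illustration, not the loss functions derived in
the paper.

```python
import torch
import torch.nn.functional as F

def jacobian_matching_loss(student, teacher, x, alpha=1.0):
    # Student pass: keep the graph so the Jacobian penalty stays
    # differentiable w.r.t. the student's parameters.
    xs = x.detach().clone().requires_grad_(True)
    s_out = student(xs)
    # Gradient of the summed outputs w.r.t. the input: a one-backward-pass
    # proxy for the full Jacobian (an illustrative simplification).
    s_jac = torch.autograd.grad(s_out.sum(), xs, create_graph=True)[0]

    # Teacher pass: its Jacobian is a fixed target, no higher-order graph.
    xt = x.detach().clone().requires_grad_(True)
    t_out = teacher(xt)
    t_jac = torch.autograd.grad(t_out.sum(), xt)[0]

    # Classical distillation term on the output activations ...
    distill = F.kl_div(F.log_softmax(s_out, dim=1),
                       F.softmax(t_out.detach(), dim=1),
                       reduction="batchmean")
    # ... plus a squared-error penalty matching the input-gradients
    # (the squared error is an assumption, not the paper's derived loss).
    return distill + alpha * F.mse_loss(s_jac, t_jac)
```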
Fair Latency-Aware Metric for real-time video segmentation networks
As supervised semantic segmentation reaches satisfying results, many recent
papers have focused on making segmentation network architectures faster,
smaller, and more efficient. In particular, studies often aim to reach the
point where they can claim to be "real-time". Achieving this goal is
especially relevant for real-time video operations in autonomous vehicles and
robots, or for medical imaging during surgery.
The common metric used so far for assessing these methods is the same as the
one used for image segmentation without time constraints: mean Intersection
over Union (mIoU). In this paper, we argue that this metric is not relevant
enough for real-time video, as it does not take the processing time (latency)
of the network into account. We propose a similar but more relevant metric for
video-segmentation networks, called FLAME, which compares the output
segmentation of the network with the ground-truth segmentation of the video
frame that is current at the time the network finishes processing.
We perform experiments comparing a few networks using this metric and propose
a simple addition to network training that improves results under this
metric.
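As an illustration of the idea, the sketch below scores each prediction
against the ground truth of the frame that is current once inference
finishes. All function and argument names are hypothetical, and the exact
matching rule used by FLAME may differ.

```python
import numpy as np

def latency_aware_miou(pred_masks, gt_masks, latencies, fps, num_classes):
    """Latency-aware mIoU in the spirit of FLAME (illustrative sketch).

    The prediction computed from frame i is scored against the ground
    truth of the frame that is current once inference finishes, i.e.
    frame i + ceil(latency * fps).
    """
    inter = np.zeros(num_classes)
    union = np.zeros(num_classes)
    n = len(gt_masks)
    for i, (pred, lat) in enumerate(zip(pred_masks, latencies)):
        # Index of the frame being shown when processing completes.
        j = min(i + int(np.ceil(lat * fps)), n - 1)
        gt = gt_masks[j]
        for c in range(num_classes):
            p, g = pred == c, gt == c
            inter[c] += np.logical_and(p, g).sum()
            union[c] += np.logical_or(p, g).sum()
    valid = union > 0  # ignore classes absent from both pred and gt
    return (inter[valid] / union[valid]).mean()
```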
SequeL: A Continual Learning Library in PyTorch and JAX
Continual Learning is an important and challenging problem in machine
learning, where models must adapt to a continuous stream of new data without
forgetting previously acquired knowledge. While existing frameworks are built
on PyTorch, the rising popularity of JAX might lead to divergent codebases,
ultimately hindering reproducibility and progress. To address this problem, we
introduce SequeL, a flexible and extensible library for Continual Learning that
supports both PyTorch and JAX frameworks. SequeL provides a unified interface
for a wide range of Continual Learning algorithms, including
regularization-based approaches, replay-based approaches, and hybrid
approaches. The library is designed for modularity and simplicity, making the
API suitable for both researchers and practitioners. We release
SequeL\footnote{\url{https://github.com/nik-dim/sequel}} as an open-source
library, enabling researchers and developers to easily experiment with and
extend the library for their own purposes.
Comment: 7 pages, 1 figure, 4 code listings
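To illustrate what a unified, backend-agnostic interface for
continual-learning algorithms can look like, here is a small hypothetical
sketch. It is not SequeL's actual API (see the linked repository for that),
only the general design pattern the abstract describes.

```python
from abc import ABC, abstractmethod

class ContinualAlgorithm(ABC):
    """Hypothetical backend-agnostic continual-learning interface.

    Illustrates the design idea of one API over PyTorch and JAX;
    this is NOT SequeL's actual API.
    """

    def __init__(self, backend: str):
        assert backend in {"pytorch", "jax"}
        self.backend = backend

    @abstractmethod
    def observe(self, batch, task_id):
        """Consume one batch from the current task's data stream."""

    @abstractmethod
    def end_task(self, task_id):
        """Consolidate knowledge (e.g., store replay samples or
        regularization anchors) before the next task arrives."""
```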
Taming GANs with Lookahead
Generative Adversarial Networks are notoriously challenging to train. The
underlying minimax optimization is highly susceptible to the variance of the
stochastic gradient and the rotational component of the associated game vector
field. We empirically demonstrate the effectiveness of the Lookahead
meta-optimization method, originally proposed for standard minimization, for
optimizing games. The backtracking step of Lookahead naturally handles the
rotational game dynamics, which in turn enables the gradient ascent descent
method to converge on challenging toy games often analyzed in the literature.
Moreover, it implicitly handles high variance without using large
mini-batches, which are known to be essential for reaching state-of-the-art
performance. Experimental results on MNIST, SVHN, and CIFAR-10 demonstrate a
clear advantage of combining Lookahead with Adam or extragradient in terms of
performance, memory footprint, and stability. Using 30-fold fewer parameters
and 16-fold smaller minibatches, we outperform the reported performance of the
class-dependent BigGAN on CIFAR-10 by obtaining a lower FID \emph{without}
using the class labels, bringing state-of-the-art GAN training within reach of
common computational resources.
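For concreteness, a minimal sketch of a Lookahead wrapper applied per player
follows. The update rule (slow weights pulled toward fast weights every k
steps, fast weights then reset) follows the published Lookahead scheme, while
the class and variable names are illustrative.

```python
import torch

class Lookahead:
    """Minimal Lookahead wrapper (sketch). Every k inner steps the slow
    weights move toward the fast weights by a factor alpha, and the fast
    weights are reset to the slow ones (the 'backtracking' step)."""

    def __init__(self, params, k=5, alpha=0.5):
        self.params = list(params)
        self.k, self.alpha, self.step_count = k, alpha, 0
        self.slow = [p.detach().clone() for p in self.params]

    def step(self):
        self.step_count += 1
        if self.step_count % self.k:
            return
        with torch.no_grad():
            for p, s in zip(self.params, self.slow):
                s.add_(self.alpha * (p - s))  # slow += alpha * (fast - slow)
                p.copy_(s)                    # backtrack fast to slow

# Usage (hypothetical training loop): one wrapper per player, called
# after each inner-optimizer step, e.g.
#   la_g = Lookahead(generator.parameters())
#   opt_g.step(); la_g.step()
```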
Geometric calibration of Colour and Stereo Surface Imaging System of ESA's Trace Gas Orbiter
There are many geometric calibration methods for "standard" cameras. These
methods, however, cannot be used to calibrate telescopes with large focal
lengths and complex off-axis optics. Moreover, specialized calibration methods
for telescopes are scarce in the literature. We describe the calibration
method that we developed for the Colour and Stereo Surface Imaging System
(CaSSIS) telescope on board the ExoMars Trace Gas Orbiter (TGO). Although our
method is described in the context of CaSSIS, with camera-specific
experiments, it is general and can be applied to other telescopes. We further
encourage re-use of the proposed method by making our calibration code and
data available online.
Comment: Submitted to Advances in Space Research
Tasting Families of Features for Image Classification
Using multiple families of image features is a very efficient strategy to
improve performance in object detection or recognition. However, such a
strategy induces multiple challenges for machine learning methods, from both a
computational and a statistical perspective. The main contribution of this
paper is a novel feature-sampling procedure dubbed “Tasting” that improves the
efficiency of Boosting in such a context. Instead of sampling features
uniformly, Tasting continuously estimates the expected loss reduction for each
family from a limited set of features sampled prior to learning, and biases
the sampling accordingly. We evaluate the performance of this procedure with
tens of families of features on four image classification and object detection
datasets. We show that Tasting, which does not require tuning any
meta-parameter, systematically outperforms variants of uniform sampling and
state-of-the-art approaches based on bandit strategies.
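As a rough illustration of the biased sampling idea, the sketch below draws
feature families in proportion to their estimated expected loss reduction.
The names and the proportional rule are assumptions for illustration, not the
paper's exact procedure.

```python
import random

def sample_families(gain_estimate, n_candidates=10):
    """Draw feature families for one Boosting round (illustrative sketch).

    gain_estimate maps each family to the expected loss reduction
    estimated from a small set of features 'tasted' before learning;
    the proportional rule below is an assumption for illustration.
    """
    families = list(gain_estimate)
    weights = [max(gain_estimate[f], 1e-12) for f in families]
    # Bias the draw toward families with larger estimated gains; each
    # drawn family then contributes candidate features to the weak
    # learner selected in this round.
    return random.choices(families, weights=weights, k=n_candidates)
```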